Mapping of Sequence Reads to the Reference Genomes ◾ 69
the quality string is not stored. Otherwise, its length must equal the length of the sequence
in SEQ.
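The QUAL rule can be checked in a few lines. The Python sketch below accepts either `*` or a string with one character per base in SEQ. It then converts each character to a Phred score using the Phred+33 ASCII encoding that the SAM specification prescribes for QUAL. The function name `decode_qual` is a hypothetical choice made for this example.

```python
from typing import List, Optional


def decode_qual(seq: str, qual: str) -> Optional[List[int]]:
    """Return per-base Phred scores, or None when QUAL is '*' (not stored)."""
    if qual == "*":
        return None
    # When present, QUAL must have exactly one character per base in SEQ.
    if len(qual) != len(seq):
        raise ValueError(f"QUAL length {len(qual)} != SEQ length {len(seq)}")
    # SAM encodes each base quality as the ASCII code of (Phred score + 33).
    return [ord(ch) - 33 for ch in qual]


print(decode_qual("ACGT", "II#5"))  # [40, 40, 2, 20]
print(decode_qual("ACGT", "*"))     # None
```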
The alignment section of a SAM file may contain a number of optional fields. Each
optional field consists of a tag, a data type, and a value, written in the
following format:
TAG:TYPE:VALUE
TAG is a two-character string. The SAM specification predefines a set of standard tags for
optional fields, and the complete list is available at “https://samtools.github.io/hts-specs/SAMtags.pdf”.
Users may also define their own tags. Tags that begin with X, Y, or Z, or that contain a
lowercase letter, are reserved for such local use.
TYPE is a single character that defines the data type of the field: “A” for a single
printable character, “B” for a general (integer or numeric) array, “f” for a single-precision
floating-point number, “H” for a byte array in hexadecimal format, “i” for a signed integer,
and “Z” for a printable string.
VALUE is the value of the field, interpreted according to its TYPE.
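The TAG:TYPE:VALUE layout can be parsed by splitting on the first two colons and converting VALUE according to TYPE. The following is a minimal Python sketch, not a replacement for a full SAM library. The helper names `parse_optional_field` and `is_local_tag` are hypothetical. The `B` branch assumes the SAM text convention of an element subtype letter (for example `c` or `f`) followed by comma-separated numbers.

```python
from typing import List, Tuple, Union

FieldValue = Union[str, int, float, bytes, List[int], List[float]]


def parse_optional_field(field: str) -> Tuple[str, str, FieldValue]:
    """Split a TAG:TYPE:VALUE field and convert VALUE according to TYPE."""
    # maxsplit=2: a Z-type string value may itself contain ':'.
    tag, typ, raw = field.split(":", 2)
    if len(tag) != 2:
        raise ValueError(f"TAG must be two characters: {tag!r}")
    if typ == "A":      # single printable character
        if len(raw) != 1:
            raise ValueError(f"A-type value must be one character: {raw!r}")
        value: FieldValue = raw
    elif typ == "i":    # signed integer
        value = int(raw)
    elif typ == "f":    # floating-point number
        value = float(raw)
    elif typ == "Z":    # printable string
        value = raw
    elif typ == "H":    # byte array written as hex digits
        value = bytes.fromhex(raw)
    elif typ == "B":    # array: subtype letter, then comma-separated values
        subtype, *items = raw.split(",")
        if subtype == "f":
            value = [float(x) for x in items]
        else:
            value = [int(x) for x in items]
    else:
        raise ValueError(f"unknown TYPE: {typ!r}")
    return tag, typ, value


def is_local_tag(tag: str) -> bool:
    """True if TAG falls in the space reserved for user-defined (local) tags."""
    return tag[0] in "XYZ" or any(c.islower() for c in tag)


# Hypothetical values in the style of the NH, HI, AS, and NM columns.
for f in ["NH:i:1", "HI:i:1", "AS:i:-2", "NM:i:0"]:
    print(parse_optional_field(f))
```

The four example fields mimic the NH, HI, AS, and NM columns discussed next. Their values are made up for illustration.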
Notice that the last four columns in the SAM file shown in Figure 2.16 are for optional
fields identified by the four predefined standard tags: “NH”, “HI”, “AS”, and “NM”. The
“NH” tag shows the number of reported alignments (number of hits) that contain the read
TABLE 2.4 CIGAR Operations and Descriptions

| Operation | Description |
|-----------|-------------|
| M | Alignment match, which can be a sequence match or mismatch |
| I | Insertion to the reference sequence |
| D | Deletion from the reference sequence |
| N | Skipped region from the reference sequence |
| S | Soft clip on the read (present in SEQ) |
| H | Hard clip on the read (not present in SEQ) |
| P | Padding (silent deletion from the padded reference sequence) |
| = | Sequence match |
| X | Sequence mismatch |
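The operations in Table 2.4 differ in whether they consume bases of the read, of the reference, or of both. M, I, S, =, and X consume read bases. M, D, N, =, and X consume reference bases. H and P consume neither. The minimal Python sketch below splits a CIGAR string into operations. It then uses this distinction to compute how many bases of SEQ the alignment covers and how many reference bases it spans. The function names are hypothetical. For example, `3S5M2I4M1D6M` covers 3 + 5 + 2 + 4 + 6 = 20 read bases and spans 5 + 4 + 1 + 6 = 16 reference bases.

```python
import re
from typing import List, Tuple

# Which operations consume bases of the read (SEQ) and of the reference.
CONSUMES_QUERY = set("MIS=X")
CONSUMES_REF = set("MDN=X")

_CIGAR_OK = re.compile(r"(?:\d+[MIDNSHP=X])+")
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string into (length, operation) pairs."""
    if cigar == "*":  # CIGAR unavailable
        return []
    if not _CIGAR_OK.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return [(int(n), op) for n, op in _CIGAR_OP.findall(cigar)]


def query_length(cigar: str) -> int:
    """Bases of SEQ covered by the CIGAR; H and P are not counted."""
    return sum(n for n, op in parse_cigar(cigar) if op in CONSUMES_QUERY)


def reference_span(cigar: str) -> int:
    """Number of reference bases spanned by the alignment, starting at POS."""
    return sum(n for n, op in parse_cigar(cigar) if op in CONSUMES_REF)


for c in ["3S5M2I4M1D6M", "10M1000N15M"]:
    print(c, query_length(c), reference_span(c))
# 3S5M2I4M1D6M 20 16
# 10M1000N15M 25 1025
```

When SEQ is stored, the query length computed this way should equal the length of SEQ, so it serves as a cheap consistency check. The reference span locates the end of an alignment: the last aligned reference position is POS + span - 1.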
TABLE 2.3 The FLAG Bitwise Decimal and Hexadecimal Numbers and Their Descriptions

| Decimal | Hexadecimal | Description of Read |
|---------|-------------|---------------------|
| 1 | 0x1 | The read is paired |
| 2 | 0x2 | Both reads of the pair are properly aligned, as determined by the aligner |
| 4 | 0x4 | The read is unmapped |
| 8 | 0x8 | The mate (next segment in the template) is unmapped |
| 16 | 0x10 | The sequence in SEQ is reverse complemented (reverse/minus strand) |
| 32 | 0x20 | The sequence of the mate (next segment) is reverse complemented |
| 64 | 0x40 | First read in the pair |
| 128 | 0x80 | Second read in the pair |
| 256 | 0x100 | The alignment is secondary |
| 512 | 0x200 | The read fails platform/vendor quality checks |
| 1024 | 0x400 | The read is a PCR or optical duplicate |
| 2048 | 0x800 | The alignment is supplementary |
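Every value in Table 2.3 is a distinct power of two. A FLAG is therefore the sum (bitwise OR) of the bits that apply, and each property can be tested with a bitwise AND. For example, 99 = 64 + 32 + 2 + 1 describes a paired, properly aligned read that is the first in its pair and whose mate maps to the reverse strand. The Python sketch below decodes and re-encodes FLAG values. The short labels in `FLAG_BITS` and the function names are illustrative choices, not standard identifiers.

```python
from typing import List

# Bit values from Table 2.3; the short labels are descriptive names chosen here.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper pair",
    0x4: "unmapped",
    0x8: "mate unmapped",
    0x10: "reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary",
    0x200: "QC fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag: int) -> List[str]:
    """List the labels of every bit set in a SAM FLAG value."""
    return [label for bit, label in FLAG_BITS.items() if flag & bit]


def encode_flag(labels: List[str]) -> int:
    """Inverse of decode_flag: OR together the bits for the given labels."""
    lookup = {label: bit for bit, label in FLAG_BITS.items()}
    value = 0
    for label in labels:
        value |= lookup[label]
    return value


print(decode_flag(99))   # ['paired', 'proper pair', 'mate reverse strand', 'first in pair']
print(decode_flag(147))  # ['paired', 'proper pair', 'reverse strand', 'second in pair']
```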